[Fix] complexity_coarsegraining(): fix method #892
# signal, size=scale, mode="nearest"
# )
# coarse = coarse[scale - 1 : :]
coarse = complexity_embedding(signal, dimension=scale, delay=1).mean(axis=1)
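For readers without NeuroKit at hand, here is a minimal sketch of what the new line appears to compute. It assumes that `complexity_embedding(signal, dimension=scale, delay=1)` returns one row per run of `scale` consecutive samples, so that averaging each row is a rolling mean over overlapping windows. `rolling_mean` is a hypothetical helper name, not part of NeuroKit.

```python
def rolling_mean(signal, scale):
    """Mean of every run of `scale` consecutive samples (overlapping windows).

    Assumption: this mirrors averaging the rows of a delay-1 embedding
    of dimension `scale`; it is an illustration, not NeuroKit's code.
    """
    n_windows = len(signal) - scale + 1
    return [sum(signal[i:i + scale]) / scale for i in range(n_windows)]


signal = [1, 2, 3, 5, 3, 1]
# Windows for scale=2: (1, 2), (2, 3), (3, 5), (5, 3), (3, 1)
coarse = rolling_mean(signal, scale=2)
print(coarse)  # [1.5, 2.5, 4.0, 4.0, 2.0]
```

Note that the output has `len(signal) - scale + 1` samples, since consecutive windows overlap by `scale - 1` samples.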
I updated the coarsegraining procedure but it doesn't seem to have entirely solved the issue:
@hsyu001 let's just make sure that we have the same sample entropy results: would you mind computing SampEn with these parameters using your own algorithm:
signal = [1, 2, 3, 5, 3, 1, 2, 4, 5, 7, 3, 2, 6, 2, 4, 8, 2]
tol = 2
With NK, this gives:
nk.entropy_sample(signal, dimension=2, delay=3, tolerance=tol)
> (0.2831469172863898,
{'Dimension': 2, 'Delay': 3, 'Tolerance': 2.0034572195207527})
So I dug a bit deeper, and the difference between our codes is most likely related to a different computation of the phis:
import neurokit2 as nk
import numpy as np
# Your MSE code isolated
def custom_mse(signal, m, delay, tol):
    N = len(signal)
    Nn = 0  # number of template matches of length m
    Nd = 0  # number of matches that still hold at length m + 1
    for i in np.arange(0, N - (m + 1) * delay).reshape(-1):
        for j in np.arange(i + delay, N - m * delay).reshape(-1):
            if (np.abs(signal[i] - signal[j]) < tol) and (np.abs(signal[i + delay] - signal[j + delay]) < tol):
                Nn += 1
                if abs(signal[i + m * delay] - signal[j + m * delay]) < tol:
                    Nd += 1
    return -np.log(Nd / Nn), [Nd, Nn]
signal = [1, 2, 3, 5, 3, 1, 2, 4, 5, 7, 3, 2, 6, 2, 4, 8, 2]
delay=1
m = 2
tol = 2
rez, info = nk.entropy_sample(signal, dimension=m, delay=delay, tolerance=tol)
rez, info["phi"]
> (0.4946962418361073, array([0.39047619, 0.23809524]))
custom_mse(signal, m, delay, tol)
> (0.916290731874155, [6, 15])
This difference likely originates from how the matches are counted. In NK we use: NeuroKit/neurokit2/complexity/utils.py Line 163 in 366583e
In any case, I have updated this branch so that the counts and embeddings are available in info:
rez, info = nk.entropy_sample(signal, dimension=m, delay=delay, tolerance=tol)
rez, info["phi"]
# This is how we compute the phi in NK
phi = [np.mean((info["count1"] - 1) / (info["embedded1"].shape[0] - 1)),
       np.mean((info["count2"] - 1) / (info["embedded2"].shape[0] - 1))]
Could you let me know if that looks correct to you, whether you have any guess as to the origin of the difference, and whether there is an error somewhere?
Also tagging @CSchoel here to maybe gain some insights as to the source of the SampEn difference.
Also, MNE seems to have a similar implementation (but with a fixed r).
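One candidate explanation that can be checked by hand is the comparison operator. The sketch below is not anyone's actual implementation: `loop_sampen` and its `inclusive` flag are hypothetical names, and the loop is generalised to check all `m` coordinates (which is identical to the loop-based code in this thread when m = 2). It uses the 17-sample signal with m = 2, delay = 1 and tol = 2.

```python
import math


def loop_sampen(signal, m, delay, tol, inclusive=False):
    """Loop-based SampEn counts with a switchable comparison operator.

    inclusive=False uses |a - b| <  tol (as in the loop-based code above).
    inclusive=True  uses |a - b| <= tol (assumed to match a radius query).
    """
    if inclusive:
        def within(a, b):
            return abs(a - b) <= tol
    else:
        def within(a, b):
            return abs(a - b) < tol

    n = len(signal)
    n_m = 0   # template matches of length m
    n_m1 = 0  # matches that still hold at length m + 1
    for i in range(0, n - (m + 1) * delay):
        for j in range(i + delay, n - m * delay):
            if all(within(signal[i + d * delay], signal[j + d * delay]) for d in range(m)):
                n_m += 1
                if within(signal[i + m * delay], signal[j + m * delay]):
                    n_m1 += 1
    return -math.log(n_m1 / n_m), [n_m1, n_m]


signal = [1, 2, 3, 5, 3, 1, 2, 4, 5, 7, 3, 2, 6, 2, 4, 8, 2]
strict = loop_sampen(signal, m=2, delay=1, tol=2, inclusive=False)
loose = loop_sampen(signal, m=2, delay=1, tol=2, inclusive=True)
print(strict)  # counts [6, 15]
print(loose)   # counts [25, 41]
```

On this integer-valued signal, the strict version gives the counts [6, 15] reported above, while the inclusive version gives [25, 41], i.e. -log(25/41) ≈ 0.4947, which is the value NK returned. NK's phis (82/210 and 50/210) are consistent with the same 41 and 25 pairs, each counted in both directions. This only suggests that the two codes may differ in `<` vs. `<=` for this example; it says nothing about the multiscale behaviour.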
The source code is downloadable in a zip file.
Here's a summary and reformulation of the problem for future reference: NeuroKit, like quite a lot of other mainstream implementations, computes SampEn using a KDTree to query the nearest neighbors within the tolerance. Here is the minimal code to reproduce NK's SampEn results:
import numpy as np
import sklearn.neighbors
import neurokit2 as nk
def sample_entropy(signal, dimension, delay, tol):
    # Embeddings of length m (last vector dropped) and m + 1
    m = nk.complexity_embedding(signal, dimension=dimension, delay=delay)[:-1]
    m1 = nk.complexity_embedding(signal, dimension=dimension+1, delay=delay)
    # Count neighbors within the tolerance (Chebyshev distance); counts include the self-match
    kdtree = sklearn.neighbors.KDTree(m, metric="chebyshev")
    count1 = kdtree.query_radius(m, tol, count_only=True)
    kdtree = sklearn.neighbors.KDTree(m1, metric="chebyshev")
    count2 = kdtree.query_radius(m1, tol, count_only=True)
    # Remove the self-match and normalise
    Nd = np.mean((count2 - 1) / (m1.shape[0] - 1))
    Nn = np.mean((count1 - 1) / (m.shape[0] - 1))
    return -np.log(Nd / Nn), [Nd, Nn]
# Test ---------
signal = [2, 4, 8, 16, 1, 3, 5, 7, 9, 11]
delay=1
dimension = 2
tol = 2
sample_entropy(signal, dimension, delay, tol)
> (0.40546510810816444, [0.14285714285714285, 0.21428571428571427])
However, this implementation behaves somewhat weirdly and not as expected (as shown by the multiscale pattern on white & pink noise). The following implementation, which gives different results (but the expected ones given the above benchmark), is unfortunately very slow in pure Python:
def sample_entropy2(signal, m, delay, tol):
    N = len(signal)
    Nn = 0
    Nd = 0
    for i in np.arange(0, N - (m + 1) * delay).reshape(-1):
        for j in np.arange(i + delay, N - m * delay).reshape(-1):
            if (np.abs(signal[i] - signal[j]) <= tol) and (np.abs(signal[i + delay] - signal[j + delay]) <= tol):
                Nn += 1
                # Note: this test uses a strict "<", whereas the one above uses "<="
                if abs(signal[i + m * delay] - signal[j + m * delay]) < tol:
                    Nd += 1
    return -np.log(Nd / Nn), [Nd, Nn]
sample_entropy2(signal, dimension, delay, tol)
> (1.791759469228055, [1, 6])
The questions are:
@zen-juen maybe you can also ask some math experts iykwim
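To make the KDTree-based normalisation easier to inspect without scikit-learn, here is a brute-force sketch of the same phi computation. The helper names (`embed`, `phi`, `sampen_bruteforce`) are hypothetical, and the inclusive comparison (`<= tol`) is an assumption about how the radius query treats points lying exactly on the boundary.

```python
import math


def embed(signal, dimension, delay=1):
    """Time-delay embedding: one vector per valid starting index."""
    n_vectors = len(signal) - (dimension - 1) * delay
    return [[signal[i + d * delay] for d in range(dimension)] for i in range(n_vectors)]


def phi(vectors, tol):
    """Mean fraction of *other* vectors within `tol` (Chebyshev distance)."""
    n = len(vectors)
    total = 0.0
    for a in vectors:
        # The count includes the self-match, which is removed below
        count = sum(1 for b in vectors if max(abs(x - y) for x, y in zip(a, b)) <= tol)
        total += (count - 1) / (n - 1)
    return total / n


def sampen_bruteforce(signal, dimension, delay, tol):
    m = embed(signal, dimension, delay)[:-1]   # drop the last vector, as above
    m1 = embed(signal, dimension + 1, delay)
    phi_m, phi_m1 = phi(m, tol), phi(m1, tol)
    return -math.log(phi_m1 / phi_m), [phi_m1, phi_m]


signal = [2, 4, 8, 16, 1, 3, 5, 7, 9, 11]
value, phis = sampen_bruteforce(signal, dimension=2, delay=1, tol=2)
print(value, phis)  # ~0.405, [~0.143, ~0.214]
```

Under these assumptions the sketch reproduces the 0.405 reported for the KDTree version. On this signal, 6 pairs match at length 2; for 4 of them the third coordinate differs by at most 2, but only for 1 of them by strictly less than 2. The mixed `<=` / `<` in the loop-based version may therefore account for the 1/6 vs. 4/6 ratio here, although that alone does not explain the multiscale pattern.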
Comparing NK vs. Loop-based
Alright, I think I've managed to narrow down the problem. Long story short, the two versions give different results for dimension > 2. Here's what GPT has to say about that:
# Check hard-coded on the first two coordinates only (i.e., dimension = 2)
if (np.abs(signal[i] - signal[j]) <= tolerance) and (np.abs(signal[i + delay] - signal[j + delay]) <= tolerance):
# Check generalised to all `dimension` coordinates
match = True
for d in range(dimension):
    if np.abs(signal[i + d * delay] - signal[j + d * delay]) > tolerance:
        match = False
        break
if match:
    Nn += 1
    if np.abs(signal[i + dimension * delay] - signal[j + dimension * delay]) < tolerance:
        Nd += 1
@hsyu001 what do you think about that? Denominator problem?
EDIT: actually, after thinking about it, I think the original is correct. However, GPT4 flagged something else for the NK implementation. It suggests changing the denominator from:
Nd = np.mean((count2 - 1) / (m1.shape[0] - 1))
Nn = np.mean((count1 - 1) / (m.shape[0] - 1))
to
Nd = np.mean((count2 - 1) / (len(signal) - dimension + 1))
Nn = np.mean((count1 - 1) / (len(signal) - dimension))
It doesn't affect the results much, but it's worth checking with a mathematician.
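The size of that effect can be estimated with a little arithmetic. Assuming delay = 1, `m` and `m1` have the same number of rows, so the two original denominators are equal and cancel in Nd / Nn. The proposed denominators differ by one, so (with the counts unchanged) the entropy would shift by log((N - dimension + 1) / (N - dimension)), where N is the signal length. `entropy_shift` below is a hypothetical helper illustrating this, not part of NK.

```python
import math


def entropy_shift(n_samples, dimension):
    """Change in -log(Nd / Nn) if only the two denominators are swapped.

    Assumes delay = 1 and identical neighbor counts in both variants.
    """
    return math.log((n_samples - dimension + 1) / (n_samples - dimension))


for n in (10, 100, 1000):
    print(n, round(entropy_shift(n, dimension=2), 5))
```

The shift shrinks roughly as 1 / N, so it is noticeable on very short signals (about 0.118 for N = 10) but around 0.001 for N = 1000, in line with the observation that the results barely change.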
Hi Dr. Makowski:
Alright, after some more consideration, I've decided not to change the sampen function for now, as I couldn't find anything wrong with it per se. The reason why it gives different results when used with multiscale with coloured noise remains a mystery, and is worth continuing to explore. I'll go ahead and merge this PR that contains fixes to coarsegraining.
Benchmark
Code
Should look like this:
But looks like this: